ABSTRACT

In research, data sets often occur in which the 
variance of the distribution of the dependent variable at given 
levels of the predictors is a function of the values of the 
predictors. In this situation, the use of weighted least-squares
(WLS) techniques is required. Weights suitable for use in a WLS
regression analysis must be estimated. A variety of techniques have
been proposed for the empirical selection of weights with the 
ultimate objective being a better "fit." The outcomes of the analysis 
must be interpreted once the fitting is complete. Problems can arise 
in the interpretation of some of the statistics when using a computer 
package. In this paper, such problems in the application and 
interpretation of WLS regression using the SPSS statistical package 
are demonstrated, both algebraically and by example. For the purposes
of the example, an artificial data set (whose underlying parametric
structure is known) has been created. Each of the statistics commonly
reported in the WLS regression analysis of such a data set is
isolated and its interpretation discussed. Where necessary,
adjusted statistics that more reasonably represent the outcomes of 
the analysis are proposed and their use illustrated. (BAE) 
INTERPRETING THE RESULTS OF WEIGHTED LEAST-SQUARES REGRESSION:
CAVEATS FOR THE STATISTICAL CONSUMER



Quite frequently in educational, psychological and
sociological research, datasets occur in which the variance of the
distribution of the dependent variable at given levels of the
predictors is a function of the values of the predictors. In this
situation, the use of weighted, rather than ordinary, least-squares
techniques is required in the fitting of regression models (Draper
& Smith, 1981).

Typically, weights suitable for use in a weighted least-squares
(WLS) regression analysis are not known in advance and must
be estimated in situ by "a combination of prior knowledge,
intuition and evidence" (Chatterjee & Price, 1977, p. 101). A
variety of techniques have been proposed in the statistical
literature for the empirical selection of the weights, ranging from
strategies that incorporate substantive knowledge of the form of
the residual variance as a function of the predictors (Miller,
1986) to two-stage strategies in which an initial unweighted (OLS)
analysis is used to inform the selection of weights (for instance,
biweighting in Mosteller & Tukey, 1977). Whatever the selected
approach, the ultimate objective is to achieve a "better" fit in
that "while the [ordinary] least-squares estimates and fit may be
satisfactory, the precision of the [ordinary] least-squares
estimates may be different from that indicated under standard
assumptions" (Cox & Snell, 1981, p. 83).

Of course, regardless of the manner in which the empirical
weights have been selected, there remains the question of
interpreting the outcomes of the analysis once the fitting is
complete. It is during this interpretation that the consumer can
be led wildly astray by the output from a computer package such as
SPSS-X. For some of the computed statistics (such as the estimated
slopes) there is no problem. However, pitfalls can arise in the
interpretation of other statistics for three reasons. First, by
virtue of the manner in which the empirical weighting must be
applied in the WLS regression by SPSS-X, several important
statistics (i.e., the standard errors associated with the slope
estimates, elements of the regression ANOVA table, the root
mean-square error and related statistics) are likely to be incorrect.
Second, because the regression statistics created in a WLS analysis
are expressed in the metric of the weighted variates, it is not
immediately obvious how even those statistics which have been
computed correctly (i.e., the coefficient of determination) should
be interpreted. Third, even though an optimal set of weights may
have been selected, many of the important statistics and
diagnostics (i.e., the standard errors and associated t-statistics,
the regression ANOVA sums-of-squares, mean-squares, F statistic and
other related statistics) may not be invariant under multiplication
of the weights by a constant, a process which modifies the
measurement metric in the weighted world and raises questions about
how the empirical weights might optimally be scaled.

In this paper, such pitfalls in the application and
interpretation of WLS regression using the SPSS-X statistical
package are demonstrated, both algebraically and by example. For
the purposes of the example, an artificial dataset (whose underlying
parametric structure is known) has been created. Each of the
statistics commonly reported in the WLS regression analysis of such
a dataset is isolated and its interpretation discussed. Where
necessary, adjusted statistics that more reasonably represent the
outcomes of the analysis are proposed and their use illustrated.



WEIGHTED LEAST-SQUARES



As Mosteller & Tukey (1977, p. 346) suggest, the action of
assigning "different weights to different observations, either for
objective reasons or as a matter of judgement" in order to
recognise "some observations as 'better' or 'stronger' than others"
has an extensive history. Whether the investigator wishes to
downplay the importance of datapoints that are intrinsically more
variable at specific levels of the predictor variables, or simply
to decrease the effect on the fit of remote datapoints, the
strategy is the same.

Although the results of this paper are easily generalizable to
the multiple predictor case, the discussion presented here deals
with the estimation of the relationship between a dependent
variable and a single predictor. We will assume that observations
or measures on two related variables, Y and X, have been obtained
from a random sample of n independent subjects and that the
relationship between these two variables in the population is given
by:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad [1]$$

where $\beta_0$ and $\beta_1$ are the unknown intercept and slope
parameters to be estimated. Furthermore, we will assume that the
$\varepsilon_i$ are unobserved random errors which are normally
distributed with zero mean and variance $\sigma^2 X_i^2$. Thus, the
random errors are heteroscedastic and the typical OLS strategy for
estimating $\beta_0$, $\beta_1$ and $\sigma^2$ is inefficient
(Neter et al., 1985).

Typically, an empirical response to the inefficiency of the
OLS estimation involves the creation of a set of weights, $w_i$, which
are inversely proportional to the squared magnitudes of the
observed $X_i$. These $w_i$ are then applied in the re-fitting of the
regression model by weighted least-squares. Of course, in
practice, it is unlikely that the functional dependence of the
heteroscedastic error variance on the $X_i$ will be known exactly.
However, in an empirical analysis, the error structure is usually
inferred from a "combination of prior knowledge, intuition, and
evidence" (Chatterjee & Price, 1977, p. 101). Often, the required
evidence is obtained by inspection of residuals created in an
initial unweighted (OLS) regression analysis (Neter et al., 1985,
p. 170).

Equations for the WLS estimates

Providing the $w_i$ are known, the more efficient WLS estimates
of $\beta_0$ and $\beta_1$, their sampling variances, and $\sigma^2$ can be
estimated by direct minimization of the sum of the squared weighted
residuals (for instance, see Neter et al., 1985, pp. 167-170).
Equivalent results can also be obtained by transformation, in which
the original variates are multiplied by the square-roots of the $w_i$
(Neter et al., 1985, pp. 171-172).
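The equivalence of the two routes can be checked numerically. The following is a minimal sketch in Python, not the authors' procedure: the small dataset, the function names, and the choice $w_i = 1/X_i^2$ are illustrative assumptions. The first function minimizes the weighted sum of squares through the usual weighted sums; the second runs a no-intercept ordinary least-squares fit on variates multiplied by $\sqrt{w_i}$.

```python
import math


def wls_direct(x, y, w):
    """Intercept and slope minimizing sum(w * (y - b0 - b1 * x) ** 2)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def wls_by_transformation(x, y, w):
    """No-intercept OLS of sqrt(w)*y on the two columns sqrt(w) and sqrt(w)*x."""
    root = [math.sqrt(wi) for wi in w]
    u = root                                      # transformed constant term
    v = [ri * xi for ri, xi in zip(root, x)]      # transformed predictor
    z = [ri * yi for ri, yi in zip(root, y)]      # transformed response
    suu = sum(a * a for a in u)
    svv = sum(a * a for a in v)
    suv = sum(a * b for a, b in zip(u, v))
    suz = sum(a * b for a, b in zip(u, z))
    svz = sum(a * b for a, b in zip(v, z))
    det = suu * svv - suv ** 2
    # Solve the 2x2 normal equations for z = b0*u + b1*v
    b0 = (svv * suz - suv * svz) / det
    b1 = (suu * svz - suv * suz) / det
    return b0, b1


# Hypothetical illustrative data (not the paper's dataset)
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [2.3, 2.4, 2.9, 2.7, 3.6, 3.1]
w = [1.0 / xi ** 2 for xi in x]

direct = wls_direct(x, y, w)
transformed = wls_by_transformation(x, y, w)
print(direct, transformed)
```

Both calls should return the same pair of estimates up to floating-point rounding, which is the sense in which the two strategies are equivalent.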

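Whatever the method of estimation, of particular interest are
the estimates of $\beta_1$ and $\beta_0$ (all summations taken over the
index $i = 1, \ldots, n$):

$$\hat{\beta}_1 = \frac{(\sum w_i)(\sum w_i X_i Y_i) - (\sum w_i X_i)(\sum w_i Y_i)}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2} \qquad [2]$$

$$\hat{\beta}_0 = \frac{(\sum w_i Y_i) - \hat{\beta}_1 (\sum w_i X_i)}{\sum w_i} \qquad [3]$$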



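and their estimated standard errors (the square roots of their
estimated sampling variances):

$$\text{s.e.}(\hat{\beta}_1) = \hat{\sigma} \left[ \frac{\sum w_i}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2} \right]^{1/2} \qquad [4]$$

$$\text{s.e.}(\hat{\beta}_0) = \hat{\sigma} \left[ \frac{\sum w_i X_i^2}{(\sum w_i)(\sum w_i X_i^2) - (\sum w_i X_i)^2} \right]^{1/2} \qquad [5]$$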
where the mean-square error $\hat{\sigma}^2$, an unbiased estimate of the
error variance parameter $\sigma^2$, is estimated from the sum of the
weighted squared residuals:

$$\hat{\sigma}^2 = \frac{\sum w_i (Y_i - \hat{Y}_i)^2}{n - 2} \qquad [6]$$

and $\hat{Y}_i$ is the predicted value of $Y_i$ obtained in the WLS analysis.

The coefficient of determination, $R^2$, is also estimated in the
transformed world. It is a measure of the proportion of the
variation in weighted Y that can be accounted for by weighted X.
Its estimation is based on the sum of the weighted squared
residuals in comparison to the sum of the squared deviations of the
weighted $Y_i$ about their (weighted) mean:

$$R^2 = 1 - \frac{\sum w_i (Y_i - \hat{Y}_i)^2}{\sum w_i \left[ Y_i - \dfrac{\sum w_i Y_i}{\sum w_i} \right]^2} \qquad [7]$$



Notice that each of these WLS estimators in Equations [2]
through [7] is essentially equivalent to the corresponding OLS
estimator, except that the WLS results have been obtained in a
transformed "world" in which each point in the dataset has been
weighted by the appropriate $w_i$. If all the $w_i$ are set equal to 1,
then the simpler OLS estimators can easily be recovered.

Scaling the weights

In addition to choosing the form of the weights by
examining an empirical residual plot (perhaps obtained in an
initial OLS regression analysis of the same data) and estimating
the functional dependence of the $w_i$ on X, the absolute magnitude or
scale of the weights must also be decided. At first glance, simple
logic might suggest that the multiplication of all the $w_i$
simultaneously by the same numerical constant would not influence
the outcomes of the analysis. Notice, for instance, that in the
estimation of $\beta_1$, $\beta_0$ and $R^2$ (Equations [2], [3] and [7]) the
multiplication of the $w_i$ by such an arbitrary constant does not
influence the estimates obtained in the WLS regression because of
cancellation of the constant in the numerators and the denominators
of these equations. Thus it seems that, given the necessary
functional dependence of the $w_i$ on $X_i$, the absolute magnitude of
the weights is unimportant.
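The cancellation argument can be illustrated with a short numerical sketch. The data, the function name `wls_fit`, and the two multipliers below are hypothetical choices for illustration; the estimators are the standard weighted-sum forms for a single predictor.

```python
def wls_fit(x, y, w):
    """Return (intercept, slope, weighted R-squared) for a one-predictor WLS fit."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    # Weighted residual and total sums of squares
    sse = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ybar_w = swy / sw
    sst = sum(wi * (yi - ybar_w) ** 2 for wi, yi in zip(w, y))
    return b0, b1, 1.0 - sse / sst


# Hypothetical illustrative data (not the paper's dataset)
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [2.3, 2.4, 2.9, 2.7, 3.6, 3.1]
w = [1.0 / xi ** 2 for xi in x]

base = wls_fit(x, y, w)
doubled = wls_fit(x, y, [2.0 * wi for wi in w])
normalized = wls_fit(x, y, [len(x) * wi / sum(w) for wi in w])
print(base)
print(doubled)
print(normalized)
```

The three printed triples should agree up to rounding, since any constant multiplier appears to the same power in the numerator and denominator of each estimator.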

However, in the estimation of the mean-square error (Equation
[6]), no such cancellation occurs and the estimation depends upon
the scaling of the weights. In particular, if all the weights are
doubled then the mean-square error is quadrupled (and the root
mean-square error is doubled). This is not entirely unexpected,
since the mean-square error is being computed in the metric of the
transformed world, and this metric is affected by the application
of an arbitrary multiplier. Nevertheless, the ad-hoc inflation of
the error estimates by the arbitrary manipulation of scale is
somewhat disconcerting in the sense that what is being estimated
here ($\sigma^2$, a population parameter of fixed value) is not
fluctuating with the selection of arbitrary global magnitudes for
the weights.

On the other hand, although the root mean-square error appears
as a multiplier in expressions for the standard errors of $\beta_0$ and
$\beta_1$, these latter estimates of precision are not affected by the
rescaling of the $w_i$. Even though $\hat{\sigma}$ may double when the weights
are arbitrarily doubled, inspection of Equations [4] and [5] in
conjunction with Equation [6] reveals that the multiplying constant
cancels out, leaving the standard errors of $\beta_0$ and $\beta_1$ unchanged.
Notice, however, that if there is a failure of the estimation of $\sigma^2$
for some reason, then the standard errors in Equations [4] and [5]
will also be incorrect; as is revealed later, this is exactly
what happens when SPSS-X REGRESSION is used to fit the model in
Equation [1] using the WLS approach.
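A small numerical check of this cancellation is sketched below. It assumes the standard single-predictor WLS forms for the standard errors, with the mean-square error taken as the weighted residual sum of squares divided by $n - 2$; the data, the function name, and the multiplier 7.5 are hypothetical.

```python
import math


def wls_standard_errors(x, y, w):
    """Return (s.e. of intercept, s.e. of slope) with error df = n - 2."""
    n = len(x)
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    sse = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    mse = sse / (n - 2)            # sample size, not sum of weights, in the df
    se_b0 = math.sqrt(mse * swxx / det)
    se_b1 = math.sqrt(mse * sw / det)
    return se_b0, se_b1


# Hypothetical illustrative data (not the paper's dataset)
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [2.3, 2.4, 2.9, 2.7, 3.6, 3.1]
w = [1.0 / xi ** 2 for xi in x]

se_base = wls_standard_errors(x, y, w)
se_scaled = wls_standard_errors(x, y, [7.5 * wi for wi in w])
print(se_base, se_scaled)
```

Because the error degrees of freedom here depend only on the number of cases, the multiplier enters the mean-square error and the weighted sums in a way that cancels, and the two pairs of standard errors should coincide.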

In essence, viable weighting schemes act to downplay the
effect of remote datapoints in the estimation process, and
therefore it is as though "outliers" are being "removed" (or at
least "diluted") by the weighting. Intuitively, this constitutes a
narrowing or focusing of the point cloud around the regression
line. Consequently, we would expect a reduction in the magnitudes
of the standard errors associated with the parameter estimates
(i.e., an increase in the precision of the estimation) under the
WLS regression strategy. This is exactly the effect desired of the
WLS fitting process and, providing the selected weights have the
appropriate dependence on the $X_i$, is independent of the scaling of
the $w_i$. However, the change in $\hat{\sigma}$ as a consequence of multiplying
the $w_i$ by a constant is somewhat disconcerting; the root
mean-square error is intended to estimate $\sigma$, a parameter that is fixed
in the population! Thus it seems that, although the scaling of the
$w_i$ is unimportant in the estimation of $\beta_0$, $\beta_1$ and their standard
errors, it is only when $w_i$ equals $1/X_i^2$ that $\sigma$ is estimated
appropriately. As is revealed later, this conflicts with a
strategy (of multiplying the $w_i$ by $n/\sum w_i$) that will be proposed in
order to rectify other problems arising when SPSS-X REGRESSION is
used for the estimation.

Finally, since $R^2$ is estimated in a transformed dataset in
which the effects of remote datapoints have been "diluted" in the
estimation process, the obtained coefficient of determination is
bound to rise when WLS fitting is used. Thus, the estimate of $R^2$
obtained unthinkingly under WLS regression is frequently much
larger than the value obtained under the corresponding OLS fit. To
the naive consumer of computer output, this apparent
increment-to-$R^2$ can represent a considerable improvement in fit and
tends to be prominently displayed in any account of the analysis, whereas
closer inspection reveals that the increment simply reflects the
extent to which outlying datapoints have been "trimmed" from the
dataset during weighting. In reality, in terms of the original
point cloud, $R^2$ cannot increase in a transition from OLS to WLS
regression analysis because the act of fitting by OLS serves to
minimise the sum of squared distances (parallel to the ordinate) of
the observed datapoints from the fitted line with consequent
maximization of $R^2$. In order that confusion be dispelled, the data
analyst should re-interpret the goodness-of-fit of the WLS
regression in the original metric, not in the transformed world. A
suitable technique is described subsequently.

WEIGHTED LEAST-SQUARES REGRESSION USING SPSS-X

In this section, the SPSS-X REGRESSION procedure is used to
analyze a sample of artificial data whose parametric structure is
known. First, the structure and creation of the sample of
artificial data is described. Second, the fitting of the
statistical model in Equation [1] using SPSS-X REGRESSION is
outlined. Third, the outcomes of the various OLS and WLS analyses
are contrasted, and specific miscalculations and inaccuracies are
noted and suitable adjustments proposed.

The data 

For the purposes of this paper, a bivariate sample of 50
observations on the pair of variables $[Y_i, X_i]$ was randomly
generated such that:

$$Y_i = 2 + .5 X_i + \varepsilon_i,$$

where the $\varepsilon_i$ were drawn from a normal distribution with zero mean
and variance $0.04 X_i^2$. Thus, in the hypothetical population from
which this sample was drawn, the functional relationship between $Y_i$
and $X_i$ has an intercept $\beta_0$ of magnitude 2, a slope $\beta_1$ of magnitude
.5, and the random errors are heteroscedastic with variance $.04 X_i^2$.
The sample data are displayed in Figure [1], where a fan-shaped
scatterplot typical of this type of heteroscedasticity is evident.
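A dataset with this structure can be generated along the following lines. This is only a sketch: the paper does not state how the $X_i$ were chosen, so the uniform range used below, the seed, and the function name `make_sample` are assumptions introduced for illustration. The error standard deviation is $.2 X_i$, the square root of the stated variance.

```python
import random


def make_sample(n=50, beta0=2.0, beta1=0.5, sigma=0.2, seed=1):
    """Draw (X, Y) pairs with Y = beta0 + beta1*X + e, where sd(e) = sigma * X."""
    rng = random.Random(seed)
    # ASSUMPTION: the distribution of X is not given in the paper;
    # a uniform range is used here purely for illustration.
    xs = [rng.uniform(0.2, 1.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma * x) for x in xs]
    return xs, ys


xs, ys = make_sample()
print(len(xs), min(xs), max(xs))
```

Plotting `ys` against `xs` should show the spread of the points widening as X increases, the fan shape described for the figure.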



Insert Figure [1] about here



Fitting the statistical model 

SPSS-X REGRESSION was used to fit the statistical model in
Equation [1] to the data displayed in Figure [1]. Both OLS and WLS
regression strategies were applied. An additional weighting
variable was created with the COMPUTE statement to contain the $w_i$
(see below), and the WEIGHT command was used to indicate this
variable to SPSS-X. This approach, which can be used to weight
almost any statistical procedure in the SPSS-X package, causes
individual cases in the dataset to be arithmetically replicated.
Then, rather than performing a WLS regression by applying Equations
[2] through [7] in the original dataset, the package simply runs an
OLS regression on the new arithmetically-modified dataset and
assumes that appropriate estimates will be produced. As is
described below, this assumption is largely unjustified.
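The effect of case replication can be mimicked in a few lines. The sketch below is an emulation under stated assumptions, not SPSS code: integer weights are used so that cases can literally be copied, and the data and the function name `fit` are hypothetical. It compares an ordinary fit on the replicated cases with a weighted fit on the original cases.

```python
def fit(x, y, w):
    """Return (intercept, slope, weighted residual sum of squares)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    sse = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return b0, b1, sse


# Hypothetical data with integer weights so that cases can be copied
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.4, 3.1, 3.3, 4.2, 4.4]
w = [4, 3, 2, 1, 1]

# Replicate each case w_i times, then run an unweighted (OLS) fit
x_rep = [xi for xi, wi in zip(x, w) for _ in range(wi)]
y_rep = [yi for yi, wi in zip(y, w) for _ in range(wi)]
b0_rep, b1_rep, sse_rep = fit(x_rep, y_rep, [1] * len(x_rep))
mse_rep = sse_rep / (len(x_rep) - 2)   # error df = sum of weights - 2

# Weighted fit on the original cases
b0_w, b1_w, sse_w = fit(x, y, w)
mse_w = sse_w / (len(x) - 2)           # error df = n - 2

print(b0_rep, b1_rep, mse_rep)
print(b0_w, b1_w, mse_w)
```

The intercept, slope and residual sum of squares agree between the two fits, but the mean-square errors differ because the replicated dataset appears to contain 11 cases rather than 5.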

For the purposes of the current demonstration, three sets of
(supposedly equivalent) weights were created. Each of these sets
of weights is proportional to the squared inverse of the value of
the independent variable. The three sets of weights differ only in
their scale, any given set of weights being simply a constant
multiple of any other set of weights. Commonsense might lead us to
believe that the arbitrary choice of scaling factor would make no
difference to the outcomes of a particular WLS regression.
However, as is shown below, this is not the case: the specific
choice of the constant used as multiplier to create a set of
weights is of crucial importance to the correct interpretation of
the findings of the WLS regression. Thus, the weighting schemes
included the basic set of weights:



$$w_{1i} = \frac{1}{X_i^2} \qquad [8]$$

A set in which each weight was double the corresponding weight in
the basic set above:

$$w_{2i} = 2 w_{1i} \qquad [9]$$

and a set for which the sum of the weighted number of cases equals
the original sample size (Moser & Kalton, 1972):

$$w_{3i} = \frac{n}{\sum w_{1i}} \, w_{1i} \qquad [10]$$
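As an illustration, the three weighting schemes described above can be built as follows. The values of X are hypothetical; only the construction of the weights follows the text.

```python
# Hypothetical predictor values (not the paper's dataset)
x = [0.25, 0.5, 0.8, 1.0, 1.25]
n = len(x)

w1 = [1.0 / xi ** 2 for xi in x]      # basic set: squared inverse of X
w2 = [2.0 * wi for wi in w1]          # each weight doubled
total_w1 = sum(w1)
w3 = [n * wi / total_w1 for wi in w1]  # rescaled so that the weights sum to n

print(sum(w1), sum(w2), sum(w3))
```

By construction the third set sums to the number of cases, whereas the sums of the first two sets depend on the values of X and on the arbitrary multiplier.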
The fitting of the statistical model in Equation [1] was carried
out four times: once using OLS regression, and three times using
WLS regression (once for each of the sets of weights presented in
Equations [8] through [10]). Excerpts from the obtained regression
results are presented in Exhibits [1] through [4].



Insert Exhibits [1]-[4] about here



Summarizing and comparing the obtained fits 

The fits obtained in Exhibits [1] through [4] are summarised
in Table [1]; also included are hand-calculated estimates obtained
by applying Equations [2] through [7] directly. All estimates
which have been computed correctly, according to Equations [2]
through [7], have been printed in boldface in Table [1]. What is
immediately obvious (and rather alarming!) is that there is very
little agreement between the estimates obtained by SPSS-X and the
correct estimates obtained by hand. Estimation of each of the
parameters is discussed briefly below.



Insert Table [1] about here



Estimated intercept and slope. From Table [1] note that all
three of the SPSS-X-computed WLS estimates of $\beta_0$ are equal to the
hand-computed estimate, regardless of the particular set of weights
applied. The four WLS estimates of $\beta_1$ also agree exactly. In
addition, the OLS estimates of $\beta_0$ and $\beta_1$ are arithmetically very
close to the obtained WLS estimates and neither set of estimates is
very far from the known underlying population values. This is not
entirely unexpected as the OLS and WLS estimators are both
unbiased.

Estimated standard errors. The principal objective of WLS
regression, applied in the context of heteroscedastic errors, is to
obtain superior estimates of the precisions of $\beta_0$ and $\beta_1$. In this
context, it is disturbing to report that the standard errors appear
to depend upon which particular set of weights was applied. Notice
that SPSS-X REGRESSION was unable to obtain a correct estimate of
the standard errors under either the first or the second set of
weights, the correct estimates being obtained only under the third
set of weights and by hand-calculation. This is particularly
disconcerting because it is the first set of weights, the $w_{1i}$, that
is the natural first choice of the data-analyst in a situation
such as this.

This fluctuation of the standard errors of $\beta_0$ and $\beta_1$ as the
regression weights are rescaled is doubly disturbing when the
earlier argument (centering on Equations [4] and [5]) is recalled.
Earlier it was argued that the estimation of precision would be
independent of any re-scaling of the weights because the $w_i$
appeared equally in both the denominators and the numerators of
Equations [4] and [5] (by virtue of appearing in the numerator of
$\hat{\sigma}^2$). And yet, in Table [1], we see quite clearly and unexpectedly
that the standard errors of $\beta_0$ and $\beta_1$ change by a factor of two
when the $w_{1i}$ are replaced by the $w_{2i}$. The reason for this peculiar
and unexpected fluctuation is largely dependent upon the failure of
SPSS-X REGRESSION to estimate $\sigma^2$ correctly.

Estimated error variance. Notice that, in the regression
ANOVA tables of Exhibits [2] through [4], the degrees-of-freedom
associated with both the error and total sums-of-squares vary with
the set of weights applied. Thus, in Exhibit [2], the estimation
appears to have been performed under the misapprehension that
there were 306 subjects in the sample rather than 50, and in
Exhibit [3] more than one thousand additional datapoints have
apparently joined the existing point cloud! It is only when the
third set of weights, the $w_{3i}$, are applied in Exhibit [4] that the
degrees-of-freedom are correct. This unlikely fluctuation of the
degrees-of-freedom with the selection of different sets of weights
is a consequence of the algorithmic strategy used by SPSS-X to fit
the WLS regressions, in which individual cases in the dataset were
arithmetically replicated, rather than applying Equations [2]
through [7] directly.

The principal failure of the arithmetic replication strategy
is apparent when $\sigma^2$ is estimated. Thus, rather than correctly
applying Equation [6], SPSS-X has based its estimation on the
equation below:

$$\hat{\sigma}^2 = \frac{\sum w_i (Y_i - \hat{Y}_i)^2}{\sum w_i - 2} \qquad [11]$$

where the sum of the weights has replaced the sample size in the
denominator of Equation [6]. The numerator of this new estimator
can only be computed appropriately when the first set of weights,
the $w_{1i}$, are applied, whereas the denominator is only correct when
the third set of weights, the $w_{3i}$, are applied. Consequently, as
is evident in Table [1], $\sigma^2$ is never estimated correctly by SPSS-X
regardless of the set of weights selected! This failure, however,
can be rectified by adjusting the estimate of $\sigma^2$ obtained under the
$w_{3i}$. In this case, an appropriate estimator of the error variance
is given by:

$$\hat{\sigma}^2 = \left( \frac{\sum w_{1i}}{n} \right) \frac{\sum w_{3i} (Y_i - \hat{Y}_i)^2}{n - 2} \qquad [12]$$

and therefore the estimate of $\sigma^2$ obtained under the $w_{3i}$ can be
corrected by multiplying by $(\sum w_{1i} / n) = (306/50)$ to give .0381, a
value which equals the value of the hand-computed estimate in Table
[1].
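The logic of this correction can be verified numerically. In the sketch below the data and function names are hypothetical; the package's behaviour is emulated by dividing the weighted residual sum of squares by the sum of the weights minus 2, as the text describes, and the result under the sample-size-normalized weights is then multiplied by the sum of the basic weights over n.

```python
def weighted_sse(x, y, w):
    """Weighted residual sum of squares at the WLS solution."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))


# Hypothetical illustrative data (not the paper's dataset)
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [2.3, 2.4, 2.9, 2.7, 3.6, 3.1]
n = len(x)
w1 = [1.0 / xi ** 2 for xi in x]
w3 = [n * wi / sum(w1) for wi in w1]

# Target: basic weights with the sample size in the denominator
mse_target = weighted_sse(x, y, w1) / (n - 2)

# Emulated package output under the normalized weights (sum of weights - 2)
mse_package_w3 = weighted_sse(x, y, w3) / (sum(w3) - 2)

# Correction: multiply by (sum of basic weights) / n
mse_corrected = mse_package_w3 * sum(w1) / n
print(mse_target, mse_corrected)

# The same arithmetic applied to the rounded values quoted in the text
paper_check = 0.0062 * 306 / 50
print(round(paper_check, 4))
```

With the rounded inputs the final line gives roughly .038; the small gap from the reported .0381 is presumably due to rounding of the mean-square error printed in Exhibit [4].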

Estimating the coefficient of determination. As noted
earlier, although all of the WLS estimates of $R^2$ in Table [1] agree
and are all correct according to Equation [7], none of these
estimates are truly appropriate for describing the empirical
goodness-of-fit. Recall that $R^2$ has been estimated in the
transformed dataset in which the effect of remote datapoints has
been "diluted" during estimation, and therefore the obtained
coefficient of determination is necessarily inflated. A more
informative measure of empirical goodness-of-fit can be computed by
comparing the $\hat{Y}_i$ predicted under the WLS fit and the observed $Y_i$ in
the original metric, not in the transformed world. An equation
suitable for computing such a pseudo-$R^2$ estimate can be obtained by
a simple adjustment of Equation [7]:

$$\text{pseudo-}R^2 = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum \left[ Y_i - \dfrac{\sum Y_i}{n} \right]^2} \qquad [13]$$

where the $\hat{Y}_i$ are the predicted values of the dependent variable
obtained under the WLS fit, and are independent of which of the
three sets of weights in Equations [8] through [10] are applied.
In the current application the value of this pseudo-$R^2$ statistic is
.5108, slightly less than the OLS estimate of .5120 as we would
have expected, given the maximization of $R^2$ in an OLS fit.
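The pseudo-$R^2$ computation can be sketched as follows. The data and function names are hypothetical; the point of the example is that the unweighted residuals of the WLS line are compared with the unweighted deviations of Y about its ordinary mean, so the result cannot exceed the OLS $R^2$.

```python
def fit(x, y, w):
    """Intercept and slope of a one-predictor weighted least-squares fit."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def r2_original_metric(x, y, b0, b1):
    """1 - SSE/SST computed with unweighted residuals and deviations."""
    ybar = sum(y) / len(y)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


# Hypothetical illustrative data (not the paper's dataset)
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [2.3, 2.4, 2.9, 2.7, 3.6, 3.1]
w = [1.0 / xi ** 2 for xi in x]

ols_r2 = r2_original_metric(x, y, *fit(x, y, [1.0] * len(x)))
pseudo_r2 = r2_original_metric(x, y, *fit(x, y, w))
print(ols_r2, pseudo_r2)
```

Since the OLS line minimizes the unweighted residual sum of squares, any other line, including the WLS line, yields a value no larger than the OLS $R^2$ in the original metric.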

RECOMMENDATIONS

As is evident in Table [1], the SPSS-X REGRESSION procedure is
spectacularly incorrect in its fitting of a simple linear
regression model by weighted least-squares. The magnitudes of many
of the obtained estimates depend strongly upon the absolute
magnitudes of the weights used in the WLS fit and, in addition,
several of the crucial reported outcomes are just plain wrong.
This paper has explored these inaccuracies, both algebraically and
by example, and has suggested a variety of fix-ups that can be
easily applied in practice.

In particular, in selecting suitable weights for application
in a WLS regression with SPSS-X, the most successful weights are
those presented in Equation [10]. These latter weights have been
adjusted prior to application by taking the theoretically-appropriate
weights of Equation [8] and re-scaling them so that
their sum is equal to the original sample-size. However, even the
application of these re-scaled weights is not entirely without
problem. Specifically, the estimation of $\sigma^2$ continues to be
incorrect and the estimation of $R^2$, while not incorrect, leads to
an inflated representation of the empirical goodness-of-fit which
is misleading at best. Simple and easily-applied adjustments to
correct both of these estimators are presented in Equations [12]
and [13] respectively.

Finally, this paper has considered only a few of the
statistics that are commonly interpreted in a typical regression
analysis. Furthermore, although many of our results are easily
generalizable to the case of multiple linear regression using WLS,
we would advise empirical researchers to be very cautious in all of
their interpretations in this latter instance. In particular,
although we have not investigated the manner in which more complex
and sophisticated statistics such as Mallows' $C_p$, Cook's D and the
Hat matrix are affected by an arbitrary re-scaling of regression
weights, it would certainly seem appropriate to advise great
caution in their interpretation too. The empirical application of
weighted least-squares regression analysis using SPSS-X would
certainly seem to be a case of "caveat emptor"!!

EXHIBIT ONE

UNWEIGHTED (OLS) REGRESSION

Coefficient of Determination, $R^2$    .5120
Root Mean-Square Error                 .1421

Analysis of Variance

Source     df    Sum of Squares    Mean Square    F
Model       1        1.0161          1.0161       50.351
Error      48         .9686           .0202
Total      49        1.9847

Variables in the Equation

Parameter    Estimate    Standard Error    t-statistic (Ho: $\beta = 0$)
$\beta_0$      1.9853         .0505            39.305
$\beta_1$       .4974         .0701             7.096

EXHIBIT TWO

WEIGHTED (WLS) REGRESSION

$w_{1i} = 1 / X_i^2$

Coefficient of Determination, $R^2$    .6737
Root Mean-Square Error                 .0776

Analysis of Variance

Source     df    Sum of Squares    Mean Square    F
Model       1        3.7747          3.7747       627.667
Error     304        1.8282           .0060
Total     305        5.6029

Variables in the Equation

Parameter    Estimate    Standard Error    t-statistic (Ho: $\beta = 0$)
$\beta_0$      1.9977         .0077           260.630
$\beta_1$       .4751         .0190            25.053

EXHIBIT THREE

WEIGHTED (WLS) REGRESSION

$w_{2i} = 2 w_{1i}$

Coefficient of Determination, $R^2$    .6737
Root Mean-Square Error                 .0774

Analysis of Variance

Source     df    Sum of Squares    Mean Square    F
Model       1       15.0987         15.0987       2523.056
Error    1222        7.3128           .0060
Total    1223       22.4115

Variables in the Equation

Parameter    Estimate    Standard Error    t-statistic (Ho: $\beta = 0$)
$\beta_0$      1.9977         .0038           522.543
$\beta_1$       .4751         .0095            50.230

EXHIBIT FOUR

WEIGHTED (WLS) REGRESSION

$w_{3i} = \dfrac{n}{\sum w_{1i}} \, w_{1i}$

Coefficient of Determination, $R^2$    .6737
Root Mean-Square Error                 .0790

Analysis of Variance

Source     df    Sum of Squares    Mean Square    F
Model       1         .6168           .6168       99.105
Error      48         .2987           .0062
Total      49         .9155

Variables in the Equation

Parameter    Estimate    Standard Error    t-statistic (Ho: $\beta = 0$)
$\beta_0$      1.9977         .0193           103.564
$\beta_1$       .4751         .0477             9.955

Table 1: Summary statistics from the four OLS and WLS regressions
estimated by SPSS-X REGRESSION in Exhibits 1 through 4, with
accompanying correct estimates obtained by hand-calculation using
Equations [2] through [7].

                                     WLS, SPSS-X-calculated
Estimate                  OLS      $w_{1i}$    $w_{2i}$    $w_{3i}$    Hand Calc

$R^2$                    .5120      .6737      .6737      .6737      .6737
$\hat{\sigma}^2$         .0202      .0060      .0060      .0062      .0381

$\hat{\beta}_0$         1.9853     1.9977     1.9977     1.9977     1.9977
s.e.($\hat{\beta}_0$)    .0505      .0077      .0038      .0193      .0193
t                       39.305    260.630    522.543    103.564    103.564

$\hat{\beta}_1$          .4974      .4751      .4751      .4751      .4751
s.e.($\hat{\beta}_1$)    .0701      .0190      .0095      .0477      .0477
t                        7.096     25.053     50.231      9.955      9.955

Known parameter values of $\sigma^2$, $\beta_0$ and $\beta_1$ are .04, 2, and .5
respectively.

FIGURE CAPTIONS

Figure 1: Bivariate scatterplot of the artificial dataset. Values
of the dependent variable $Y_i$ plotted against values of the
independent variable $X_i$, for $i = 1, \ldots, n$.